Commitment Semantics for Sequential Decision Making under Reward Uncertainty

Authors

  • Qi Zhang
  • Edmund H. Durfee
  • Satinder P. Singh
  • Anna Chen
  • Stefan J. Witwicki
Abstract

Cooperating agents can make commitments to help each other, but commitments might have to be probabilistic when actions have stochastic outcomes. We consider the additional complication in cases where an agent might prefer to change its policy as it learns more about its reward function from experience. How should such an agent be allowed to change its policy while still faithfully pursuing its commitment in a principled decision-theoretic manner? We address this question by defining a class of Dec-POMDPs with Bayesian reward uncertainty, and by developing a novel Commitment Constrained Iterative Mean Reward algorithm that implements the semantics of faithful commitment pursuit while still permitting the agent’s response to the evolving understanding of its rewards. We bound the performance of our algorithm theoretically, and evaluate empirically how it effectively balances solution quality and computation cost.

Similar resources

On the Trustworthy Fulfillment of Commitments

An agent that adopts a commitment to another agent should act so as to bring about a state of the world meeting the specifications of the commitment. Thus, by faithfully pursuing a commitment, an agent can be trusted to make sequential decisions that it believes can cause an intended state to arise. In general, though, an agent’s actions will have uncertain outcomes, and thus reaching an intend...

A POMDP Extension with Belief-dependent Rewards

Partially Observable Markov Decision Processes (POMDPs) model sequential decision-making problems under uncertainty and partial observability. Unfortunately, some problems cannot be modeled with state-dependent reward functions, e.g., problems whose objective explicitly implies reducing the uncertainty on the state. To that end, we introduce ρPOMDPs, an extension of POMDPs where the reward func...

Utilizing Decision Making Methods and Optimization Techniques to Develop a Model for International Facility Location Problem under Uncertainty

The purpose of this study is to consider an international facility location problem under uncertainty and present an integrated model for strategic and operational planning. The paper offers two methodologies for the location selection decision. First, the extended VIKOR method for decision-making problems with interval numbers is presented as a methodology for strategic evaluation of po...

Solving Sequential Decision-making Problems under Virtual Reality Simulation System

A large class of problems of sequential decision-making can be modeled as Markov or Semi-Markov Decision Problems, which can be solved by classical methods of dynamic programming. However, the computational complexity of the classical MDP algorithms, such as value iteration and policy iteration, is prohibitive and will grow intractably with the size of problems. Furthermore, they require for ea...

A New Balancing and Ranking Method based on Hesitant Fuzzy Sets for Solving Decision-making Problems under Uncertainty

The purpose of this paper is to extend a new balancing and ranking method to handle uncertainty for a multiple attribute analysis under a hesitant fuzzy environment. The presented hesitant fuzzy balancing and ranking (HF-BR) method does not require attributes’ weights through the process of multiple attribute decision making (MADM) under hesitant conditions. For the rating of possible alternati...

Publication date: 2016